Dynamic Script Generation
This document explains the dynamic script generation system that transforms natural language goals into safe, executable browser actions. It covers the end-to-end pipeline from goal interpretation to validated action plans, the templating and prompting system, parameter injection, safety validation, sandboxing considerations, error handling, and execution monitoring. It also documents how agent decisions map to generated code, including fallback strategies and error recovery mechanisms.
Dynamic script generation spans backend services, routing, prompts, sanitization utilities, and the browser extension runtime:
Backend API and service orchestration
Prompt templates and LLM invocation
Validation and sanitization
Extension-side action execution and content script helpers
Model schemas for request/response
Architecture diagram (components and relationships): in the backend, GenerateScriptRequest feeds the /generate-script route, which calls AgentService.generate_script(); the service uses SCRIPT_PROMPT (a ChatPromptTemplate) and agent_sanitizer.sanitize_json_actions(), and produces a GenerateScriptResponse. In the agent runtime, ReactAgentService.generate_answer() drives the GraphBuilder LangGraph workflow. In the extension, executeBrowserActions() in executeActions.ts calls performAction()/findElement() in content.ts.
Prompt template and LLM chain for action plan generation
Service orchestrating prompt assembly, LLM invocation, and validation
Sanitization and safety checks for generated JSON action plans
API router validating inputs and returning structured responses
Extension utilities for executing actions and content script helpers
Agent runtime for conversational reasoning and tool selection
Key responsibilities:
Goal interpretation and action planning
Parameter injection into prompt templates
Safety validation and error reporting
Execution coordination between backend and extension
The system follows a clear separation of concerns:
Frontend sends a goal with optional DOM context and constraints.
Backend composes a prompt with DOM information and constraints, invokes the LLM, validates the JSON action plan, and returns a structured response.
The extension executes actions against the active tab, delegating DOM-specific actions to the content script.
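The request and response exchanged between the frontend and backend can be pictured with a minimal sketch. The class names come from the component list above and the field names from the pipeline inputs and response fields described in this document; the field types and defaults are assumptions, not the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class GenerateScriptRequest:
    """Assumed request shape: a goal plus optional context."""
    goal: str
    target_url: Optional[str] = None
    dom_structure: Optional[dict[str, Any]] = None
    constraints: Optional[dict[str, Any]] = None


@dataclass
class GenerateScriptResponse:
    """Assumed response shape: ok flag, plan, and optional diagnostics."""
    ok: bool
    action_plan: Optional[dict[str, Any]] = None
    problems: list[str] = field(default_factory=list)
    error: Optional[str] = None


# Only the goal is mandatory in this sketch; everything else is optional.
request = GenerateScriptRequest(goal="Search for wireless headphones")
failure = GenerateScriptResponse(ok=False, problems=["actions must be a list"])
```

Keeping problems as a list lets a failed response carry several findings at once, which matches the way validation feedback is described later in this document.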
Prompt Template and Script Generation
The prompt defines available actions (DOM manipulation and tab/window control), selector best practices, and critical rules for safe and effective automation.
The service composes a user prompt including goal, target URL, constraints, and a limited DOM snapshot of interactive elements.
The LLM produces a JSON action plan; the service extracts the content and passes it to the sanitizer.
Validation highlights:
Enforces presence of an actions array and per-action fields.
Validates action types and required parameters (e.g., selector for CLICK/TYPE/SELECT, url for OPEN_TAB/NAVIGATE).
Detects potentially dangerous EXECUTE_SCRIPT patterns.
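The structural checks listed above could look roughly like the following. This is a sketch under stated assumptions, not the code of sanitize_json_actions(): the function name check_action_plan is hypothetical, the action vocabulary is limited to the types named in this document, and the problem messages are invented.

```python
import json

# Assumed action vocabulary: only the types named in this document.
SELECTOR_ACTIONS = {"CLICK", "TYPE", "SELECT"}
URL_ACTIONS = {"OPEN_TAB", "NAVIGATE"}
OTHER_ACTIONS = {"WAIT", "EXECUTE_SCRIPT"}
KNOWN_ACTIONS = SELECTOR_ACTIONS | URL_ACTIONS | OTHER_ACTIONS


def check_action_plan(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the plan passed."""
    try:
        plan = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    actions = plan.get("actions") if isinstance(plan, dict) else None
    if not isinstance(actions, list):
        return ["plan must be an object with an 'actions' array"]

    problems = []
    for i, action in enumerate(actions):
        if not isinstance(action, dict):
            problems.append(f"action {i}: must be an object")
            continue
        kind = action.get("type")
        if kind not in KNOWN_ACTIONS:
            problems.append(f"action {i}: unknown type {kind!r}")
        elif kind in SELECTOR_ACTIONS and not action.get("selector"):
            problems.append(f"action {i}: {kind} requires a selector")
        elif kind in URL_ACTIONS and not action.get("url"):
            problems.append(f"action {i}: {kind} requires a url")
    return problems
```

Collecting every problem instead of stopping at the first one gives the caller a complete picture to act on in a single round trip.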
Action Planning Pipeline
The pipeline stages:
Input assembly: goal, target_url, dom_structure, constraints.
Prompt construction: DOM summary and constraints embedded into a structured prompt.
LLM invocation: ChatPromptTemplate chained to the LLM client.
Validation: JSON parsing, structure checks, and safety checks.
Response shaping: ok flag, action_plan, and optional problems/error.
Pipeline flow: build the prompt with DOM and constraints, invoke the LLM chain, and parse the JSON response. If the response is valid JSON with valid actions, return the action plan; otherwise collect the problems or error and return them.
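The stages can be traced end to end with a small offline sketch. It assumes a plain callable in place of the real LLM chain and returns dictionaries rather than the project's response model; the prompt wording and the problem messages are invented for illustration.

```python
import json
from typing import Callable


def generate_script(goal: str, dom_elements: list[dict], llm: Callable[[str], str]) -> dict:
    """Run the stages in order: build prompt, call LLM, parse, validate, shape."""
    prompt = f"Goal: {goal}\nInteractive elements: {json.dumps(dom_elements)}"
    try:
        raw = llm(prompt)
    except Exception as exc:  # an LLM or transport failure becomes an error response
        return {"ok": False, "error": str(exc)}

    try:
        plan = json.loads(raw)
    except json.JSONDecodeError:
        return {"ok": False, "problems": ["response is not valid JSON"]}

    if not isinstance(plan, dict) or not isinstance(plan.get("actions"), list):
        return {"ok": False, "problems": ["missing 'actions' array"]}
    return {"ok": True, "action_plan": plan}


def fake_llm(prompt: str) -> str:
    """Stand-in for the real LLM chain so the sketch runs offline."""
    return '{"actions": [{"type": "CLICK", "selector": "#submit"}]}'


result = generate_script("Submit the form", [{"tag": "button", "id": "submit"}], fake_llm)
```

Passing the LLM in as a parameter keeps the parsing and validation stages testable without network access.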
Script Templating and Parameter Injection
The prompt template is a ChatPromptTemplate with a system message enumerating actions and rules, plus user content constructed from the goal, target URL, constraints, and DOM snapshot.
Parameter injection occurs by formatting the user prompt string with the provided inputs and limiting the number of interactive elements to control token usage.
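Parameter injection with a capped DOM snapshot might be sketched as follows. Plain str.format stands in for the ChatPromptTemplate so the example has no dependencies; the template wording, the build_user_prompt helper, the element fields, and the cap of 50 are all assumptions, since the actual limit is not stated here.

```python
MAX_ELEMENTS = 50  # assumed cap; the real limit is not stated in this document

USER_TEMPLATE = (
    "Goal: {goal}\n"
    "Target URL: {target_url}\n"
    "Constraints: {constraints}\n"
    "Interactive elements ({shown} of {total}):\n{dom_summary}"
)


def build_user_prompt(goal, target_url, constraints, elements, limit=MAX_ELEMENTS):
    """Inject inputs into the template, keeping at most `limit` elements."""
    shown = elements[:limit]
    dom_summary = "\n".join(
        f"- <{el.get('tag', '?')}> {el.get('selector', '')}".rstrip() for el in shown
    )
    return USER_TEMPLATE.format(
        goal=goal,
        target_url=target_url or "(none)",
        constraints=constraints or "(none)",
        shown=len(shown),
        total=len(elements),
        dom_summary=dom_summary or "(no DOM snapshot provided)",
    )


# 120 elements go in, but only the first 50 reach the prompt.
elements = [{"tag": "input", "selector": f"#field-{i}"} for i in range(120)]
prompt = build_user_prompt("Fill in the form", "https://example.com", None, elements)
```

Reporting how many elements were shown out of the total tells the model that the snapshot is partial, which is a design choice of this sketch rather than something the source confirms.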
Best practices reflected in the template:
Prefer explicit selectors and avoid chrome:// pages for DOM actions.
Prefer constructing full search URLs directly in OPEN_TAB.
Encourage atomic, clearly described steps.
Safety Validation and Sandboxing
Safety measures:
JSON validation ensures required fields and correct types.
EXECUTE_SCRIPT validation scans for dangerous patterns (e.g., eval-like constructs).
Tab control actions require mandatory fields (e.g., url for OPEN_TAB/NAVIGATE).
DOM actions are restricted to http/https contexts per prompt rules.
Sandboxing considerations:
EXECUTE_SCRIPT is allowed but subject to basic pattern checks; it runs in the extension’s content script context.
DOM actions are delegated to the content script via message passing, reducing direct exposure of unsafe patterns in the extension host.
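Two of these safety checks lend themselves to a short sketch: scanning an EXECUTE_SCRIPT body for eval-like constructs, and confirming that a page URL is http/https before planning DOM actions. The deny-list below is illustrative only; the sanitizer's real pattern set is not shown in this document, and both function names are hypothetical.

```python
import re
from urllib.parse import urlparse

# Illustrative deny-list; the sanitizer's real pattern set is not shown here.
DANGEROUS_PATTERNS = [
    re.compile(r"\beval\s*\("),
    re.compile(r"\bnew\s+Function\s*\("),
    re.compile(r"\bset(?:Timeout|Interval)\s*\(\s*['\"]"),
]


def script_problems(script: str) -> list[str]:
    """Flag eval-like constructs in an EXECUTE_SCRIPT body."""
    return [
        f"dangerous pattern: {p.pattern}" for p in DANGEROUS_PATTERNS if p.search(script)
    ]


def is_dom_actionable(url: str) -> bool:
    """DOM actions are only meaningful on http/https pages."""
    return urlparse(url).scheme in ("http", "https")
```

A deny-list of this kind is a basic filter rather than a sandbox, since pattern matching can be evaded by obfuscated code. That is consistent with the description above of EXECUTE_SCRIPT being subject only to basic pattern checks.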
Execution Monitoring and Extension Integration
The extension receives action lists and executes them sequentially with delays between actions.
DOM actions are sent to the active tab via messaging; the content script performs element queries and interactions.
The content script includes helpers for element finding and a small set of built-in actions for quick tasks.
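The execution loop can be modeled in a few lines. The extension itself is written in TypeScript, so the following Python is only a language-neutral model of the behavior described in this document, not a port of executeBrowserActions(). The handler parameters, the split between DOM and other actions, and the 0.5 second default delay are assumptions.

```python
import time
from typing import Callable

DOM_ACTIONS = {"CLICK", "TYPE", "SELECT"}  # assumed set routed to the content script


def execute_actions(
    actions: list[dict],
    host_handler: Callable[[dict], None],
    dom_handler: Callable[[dict], None],
    delay_s: float = 0.5,
) -> list[dict]:
    """Run actions in order, pausing between them and recording each outcome."""
    results = []
    for index, action in enumerate(actions):
        handler = dom_handler if action.get("type") in DOM_ACTIONS else host_handler
        try:
            handler(action)
            results.append({"index": index, "ok": True})
        except Exception as exc:
            # Record the failure and keep going with the next action.
            results.append({"index": index, "ok": False, "error": str(exc)})
        if index < len(actions) - 1:
            time.sleep(delay_s)
    return results


def fake_dom_handler(action: dict) -> None:
    """Stand-in for messaging the content script."""
    if action.get("selector") == "#missing":
        raise LookupError("element not found")


plan = [
    {"type": "OPEN_TAB", "url": "https://example.com"},
    {"type": "CLICK", "selector": "#missing"},
    {"type": "CLICK", "selector": "#ok"},
]
outcome = execute_actions(plan, lambda a: None, fake_dom_handler, delay_s=0)
```

Returning a per-action result list is an addition of this sketch; it gives a caller something to inspect beyond console logs.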
Relationship Between Agent Decisions and Generated Code
The prompt template encodes decision rules: choose tab control vs DOM actions based on intent, prefer direct navigation for searches, and use precise selectors.
The sanitizer enforces the checkable parts of these rules (plan structure, required parameters, and script safety) at validation time, returning actionable feedback when plans violate them.
The extension faithfully executes the validated plan, with content script helpers enabling DOM interactions.
Fallback and recovery:
If validation fails, the API returns ok=false with problems; the caller can refine the goal or DOM context and retry.
For runtime errors during execution, the extension logs failures and continues to the next action after a delay.
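On the caller side, the retry path after a failed validation might be sketched like this. Feeding the reported problems back into the goal text is one possible refinement strategy, chosen here for illustration; the function name, the attempt limit, and the feedback wording are assumptions.

```python
from typing import Callable


def generate_with_retry(
    goal: str,
    generate: Callable[[str], dict],
    max_attempts: int = 3,
) -> dict:
    """Call `generate` until it returns ok=True or attempts run out."""
    current_goal = goal
    response: dict = {"ok": False, "problems": ["no attempts made"]}
    for _ in range(max_attempts):
        response = generate(current_goal)
        if response.get("ok"):
            return response
        # Feed the reported problems back so the next attempt can avoid them.
        feedback = "; ".join(response.get("problems", []))
        current_goal = f"{goal}\nAvoid these issues: {feedback}"
    return response
```

Bounding the number of attempts keeps a persistently failing goal from looping indefinitely; the last failed response is returned so its problems remain visible to the caller.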
Examples of Generated Scripts
Below are representative action plan structures produced by the system. These are conceptual examples derived from the prompt template and sanitizer rules.
Click an element:
type: "CLICK"
selector: "..."
description: "..."
Type into an input:
type: "TYPE"
selector: "..."
value: "..."
description: "..."
Navigate to a search result page:
type: "OPEN_TAB"
url: "..."
active: true
description: "..."
Combined workflow (open tab, wait, type, click):
OPEN_TAB with url and active
WAIT with time
TYPE with selector and value
CLICK with selector
These examples reflect the prompt’s preference for direct navigation URLs and atomic steps with clear descriptions.
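For concreteness, the combined workflow could be written out as a complete plan. Every URL, selector, value, and description below is invented for illustration, and the WAIT time is assumed to be in milliseconds, which this document does not confirm.

```python
import json

# All URLs, selectors, and values below are invented for illustration.
action_plan = {
    "actions": [
        {
            "type": "OPEN_TAB",
            "url": "https://example.com/search?q=laptops",
            "active": True,
            "description": "Open the search results in a new tab",
        },
        # Unit assumed to be milliseconds.
        {"type": "WAIT", "time": 1000, "description": "Give the page time to load"},
        {
            "type": "TYPE",
            "selector": "input[name='q']",
            "value": "gaming laptops",
            "description": "Refine the query",
        },
        {
            "type": "CLICK",
            "selector": "button[type='submit']",
            "description": "Submit the refined search",
        },
    ]
}

# Serialize to the JSON form that a validator would receive.
print(json.dumps(action_plan, indent=2))
```

Placing the search terms directly in the OPEN_TAB URL mirrors the stated preference for direct navigation over typing into a search box.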
Agent Runtime and Conversational Planning
While dynamic script generation focuses on action plans, the agent runtime supports broader conversational reasoning:
A LangGraph workflow coordinates an agent node and tool execution.
The runtime normalizes messages and integrates tools, including a browser action tool that delegates to the script generation service.
Router depends on AgentService and models for request/response.
AgentService composes SCRIPT_PROMPT and invokes the LLM, then applies agent_sanitizer.
Extension utilities depend on browser APIs for tab management and content script messaging.
The agent runtime composes tools and a LangGraph workflow.
Limit DOM snapshots: The service truncates interactive elements to reduce token usage.
Batch execution delays: The extension introduces small delays between actions to prevent overwhelming the page.
Validation overhead: JSON parsing and safety checks occur synchronously; keep action plans concise and atomic.
Prompt caching: Consider caching repeated prompts or using a smaller subset of DOM data when feasible.
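The caching suggestion could take the form of a small in-memory cache keyed by a hash of the goal and DOM snapshot. This is a sketch of one option, not something the system is stated to implement; the PlanCache class and its generate callback are hypothetical.

```python
import hashlib
import json
from typing import Callable


class PlanCache:
    """In-memory cache keyed by a hash of the goal and DOM snapshot."""

    def __init__(self, generate: Callable[[str, list], dict]):
        self._generate = generate
        self._store: dict[str, dict] = {}

    @staticmethod
    def _key(goal: str, elements: list) -> str:
        # sort_keys makes the key stable regardless of dict key order.
        payload = json.dumps({"goal": goal, "dom": elements}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, goal: str, elements: list) -> dict:
        """Return a cached plan, generating it only on the first request."""
        key = self._key(goal, elements)
        if key not in self._store:
            self._store[key] = self._generate(goal, elements)
        return self._store[key]
```

Because pages change, a cached plan can go stale; a real deployment would likely want an expiry policy and might choose not to cache failed responses.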
Common issues and resolutions:
Invalid JSON or missing fields:
The sanitizer reports problems; refine the goal or provide a richer DOM context.
Missing required fields for actions:
Provide a selector for DOM actions (CLICK, TYPE, SELECT) and a url for tab actions (OPEN_TAB, NAVIGATE).
Dangerous EXECUTE_SCRIPT patterns:
Simplify or avoid custom scripts; rely on supported DOM actions.
Extension execution failures:
Check console logs for action errors; ensure the active tab is reachable and the selector is correct.
API validation errors:
The endpoint returns ok=false with problems; address reported issues and retry.
The dynamic script generation system combines a structured prompt template, robust validation, and extension-based execution to safely transform natural language goals into executable browser actions. By enforcing strict validation rules, limiting DOM context, and using message-passing for DOM operations, the system balances flexibility with safety. The agent runtime complements this by enabling broader conversational planning, while the API and extension layers provide clear integration points for execution monitoring and error recovery.